The coupling space of perceptrons with continuous as well as with binary weights is partitioned into a disordered multifractal by a set of $p = \gamma N$ random input patterns. The multifractal spectrum $f(\alpha)$ can be calculated analytically using the replica formalism. The storage capacity and the generalization behaviour of the perceptron are shown to be related to properties of $f(\alpha)$ which are correctly described within the replica symmetric ansatz. Replica symmetry breaking is interpreted geometrically as a transition from percolating to non-percolating cells. The existence of empty cells gives rise to singularities in the multifractal spectrum. The analytical results for binary couplings are corroborated by numerical studies.
I. INTRODUCTION

Simple networks of formal neurons with emergent properties for information processing have been discussed within the framework of statistical mechanics for more than 10 years now. In particular the simplest case of a feed-forward neural network, the single-layer perceptron, has been analyzed from various points of view and with respect to rather different properties in numerous papers. This is mainly due to the fact that the storage as well as the generalization abilities of this network can be concisely described using the phase space formalism introduced by Elizabeth Gardner [1]. Part of these investigations are summarized in recent reviews [?].

Against the background of an ever-growing body of investigations aimed at more and more special aspects of this system it seems appropriate to look for a unifying framework that allows one to characterize the various properties in a coherent fashion. In the present paper we show that the geometrical structure of the coupling space of the perceptron shattered by a random set of inputs offers such a possibility. In fact the statistical properties of the partition of the coupling space into cells corresponding to different output sequences can be quantitatively characterized using methods from the theory of multifractals. With the help of the replica trick the multifractal spectrum can be calculated explicitly. Many of the relevant properties of the perceptron such as the storage capacity, the typical volume of the version space and the generalization ability are closely related to special properties of this multifractal spectrum. As a result the relations between different investigations become more transparent.

The idea of characterizing the perceptron by the distribution of cells in coupling space induced by the inputs is rather old. It is already the basis of the classical determination of the storage capacity by Cover [?] and is frequently used in studies of information processing in mathematical statistics and computer science (see, e.g., [?]). Its qualitative appeal within the framework of statistical mechanics was emphasized by Derrida et al. [?] who, however, seem not to have realized that the relevant quantities could in fact be calculated. This became clear only after the work of Monasson and O'Kane [?] characterizing the distribution of internal representations in the reversed wedge perceptron. Meanwhile these investigations were extended to the case of multi-layer networks and have produced several new results [?]. But also for the simple perceptron this formalism offers the possibility of a systematic and coherent description clarifying several delicate points of former investigations. In the present paper we present a detailed analysis of the perceptron from this point of view. Some of the results were already published in [?].

The paper is organized as follows. In section II we present the general formalism of multifractals in its application 
to neural networks. Section III contains the analysis of the spherical perceptron, in section IV the Ising perceptron is 
discussed. A summary is given in the last section. 
II. GENERAL FORMALISM



In this paper we analyse the coupling space of simple perceptrons. These are defined by the relation

$$\sigma = \mathrm{sgn}(\mathbf{J}\boldsymbol{\xi}) = \mathrm{sgn}\Big(\sum_{i} J_i \xi_i\Big) \tag{1}$$

between N input bits $\xi_i = \pm 1$, $i = 1, \ldots, N$, and a single output $\sigma = \pm 1$. We are interested in the thermodynamic limit $N \to \infty$. The coupling vector $\mathbf{J} \in \mathbb{R}^N$ is model-dependent: for the spherical perceptron the only condition is the normalization of this vector to $\sqrt{N}$, whereas in the case of the Ising perceptron it has binary components $J_i = \pm 1$.

We choose $p = \gamma N$ random independent and identically distributed input patterns $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, $\mu = 1, \ldots, p$. The hyperplane orthogonal to each of these patterns cuts the coupling space into two parts according to the two possible outputs. The p patterns therefore generate a random partition of the coupling space into $2^p$ (possibly empty) cells

$$C(\{\sigma^\mu\}_{\mu=1,\ldots,p}) = \{\mathbf{J};\ \sigma^\mu = \mathrm{sgn}(\mathbf{J}\boldsymbol{\xi}^\mu)\ \forall \mu\} \tag{2}$$

labeled by the $2^p$ output sequences $\boldsymbol{\sigma} = \{\sigma^\mu\}$ (fig. 1).




FIG. 1. Symbolic representation of the random partition of a spherical coupling space for $N = 3$ and $p = 4$. The figure shows the existence of a "mirror cell" corresponding to the symmetry of (1) under the transformation $(\mathbf{J}, \boldsymbol{\sigma}) \leftrightarrow (-\mathbf{J}, -\boldsymbol{\sigma})$.

The relative cell size $P(\boldsymbol{\sigma}) = V(\boldsymbol{\sigma}) / \sum_{\boldsymbol{\tau}} V(\boldsymbol{\tau})$ gives the probability of generating the output $\boldsymbol{\sigma}$ for a given input sequence $\boldsymbol{\xi}^\mu$ with a coupling vector $\mathbf{J}$ drawn at random from a uniform distribution over the whole space of couplings. The natural scale of this quantity in the thermodynamic limit is $\epsilon = 2^{-N}$. For the Ising perceptron this corresponds to a cell containing just a single coupling vector. It is convenient to characterize the cell sizes by the crowding index $\alpha(\boldsymbol{\sigma})$ defined by

$$P(\boldsymbol{\sigma}) = \epsilon^{\alpha(\boldsymbol{\sigma})}. \tag{3}$$

As nicely discussed in the part by Derrida of [?], the storage and generalization properties of the perceptron are coded in the distribution of cell sizes defined by

$$f(\alpha) = \lim_{N\to\infty} \frac{1}{N \log 2}\, \log \sum_{\boldsymbol{\sigma}} \delta(\alpha - \alpha(\boldsymbol{\sigma})). \tag{4}$$
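The definitions (2) and (3) can be made concrete for a very small Ising perceptron by brute force. The following is only an illustrative sketch: the sizes `N = 9` and `P_PATTERNS = 6`, the random seed and all function names are assumptions chosen for the example, and such small systems are far from the thermodynamic limit discussed in the text.

```python
import itertools
import math
import random
from collections import Counter

def cell_counts(patterns, n):
    """Map each realized output sequence to the number of binary couplings producing it."""
    counts = Counter()
    for J in itertools.product((-1, 1), repeat=n):
        sigma = tuple(1 if sum(j * x for j, x in zip(J, xi)) > 0 else -1
                      for xi in patterns)
        counts[sigma] += 1
    return counts

def crowding_index(count, n):
    """alpha defined through P = 2**(-n * alpha) with P = count / 2**n, cf. Eq. (3)."""
    return 1.0 - math.log2(count) / n

random.seed(0)
N, P_PATTERNS = 9, 6  # odd N: the local field J.xi is odd and never vanishes
patterns = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(P_PATTERNS)]

counts = cell_counts(patterns, N)
alphas = {sigma: crowding_index(c, N) for sigma, c in counts.items()}

# Empirical distribution of crowding indices, a crude finite-N stand-in for f(alpha).
histogram = Counter(round(a, 3) for a in alphas.values())
for a in sorted(histogram):
    print(f"alpha = {a:.3f}: {histogram[a]} cells")
```

Every cell appears together with its mirror cell of equal size, which is the symmetry illustrated in fig. 1.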
To calculate this quantity within the framework of statistical mechanics one uses the formal analogy of $f(\alpha)$ with the microcanonical entropy of the spin system $\boldsymbol{\sigma}$ with Hamiltonian $\alpha(\boldsymbol{\sigma})$. It can hence be determined from the corresponding free energy

$$\tau(q) = -\lim_{N\to\infty} \frac{1}{N \log 2} \Big\langle\!\Big\langle \log \sum_{\boldsymbol{\sigma}} \exp\big(-q N \log 2\; \alpha(\boldsymbol{\sigma})\big) \Big\rangle\!\Big\rangle = -\lim_{N\to\infty} \frac{1}{N \log 2} \Big\langle\!\Big\langle \log \sum_{\boldsymbol{\sigma}} P^q(\boldsymbol{\sigma}) \Big\rangle\!\Big\rangle \tag{5}$$

via a Legendre transformation with respect to the inverse temperature q,

$$f(\alpha) = \min_q\,[\alpha q - \tau(q)]. \tag{6}$$

This procedure [?] is very similar to the so-called thermodynamic formalism in the theory of multifractals [11,12], where the multifractal spectrum $f(\alpha)$ is introduced to characterize a probability measure by the moments

$$\langle P^q \rangle = \sum_{\boldsymbol{\sigma}} P^q(\boldsymbol{\sigma}) = \epsilon^{\tau(q)}. \tag{7}$$

In this connection $\tau(q)$ is called the mass exponent. The only new feature here is the additional average over the random inputs $\boldsymbol{\xi}^\mu$ represented by $\langle\langle \ldots \rangle\rangle$ in (5). The application of multifractal techniques to the theory of neural networks was initiated by Monasson and O'Kane in their study of the distribution of internal representations of multi-layer neural networks [?].
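The relation between the mass exponent and the spectrum in Eqs. (5) to (7) can be tried out on a toy measure where everything is known in closed form. The sketch below is an assumption-laden illustration, not the perceptron calculation: it uses a binomial cascade with weights `P0` and `1 - P0` on $2^N$ cells (no pattern average), for which $\tau(q) = -\log_2(P_0^q + (1-P_0)^q)$ exactly, and evaluates the Legendre transform (6) on a finite grid of q values.

```python
import math

def tau(probs, n_levels, q):
    """Finite-N mass exponent: sum_sigma P^q = 2**(-N * tau(q)), cf. Eq. (7)."""
    return -math.log(sum(p ** q for p in probs)) / (n_levels * math.log(2))

def legendre_f(tau_of_q, alpha, q_grid):
    """f(alpha) = min_q [alpha * q - tau(q)] on a grid of q values, cf. Eq. (6)."""
    return min(alpha * q - tau_of_q(q) for q in q_grid)

# Toy measure (an assumption for illustration): binomial cascade on 2**N cells.
N, P0 = 10, 0.3
probs = [P0 ** k * (1 - P0) ** (N - k)
         for k in range(N + 1) for _ in range(math.comb(N, k))]

tau_n = lambda q: tau(probs, N, q)
q_grid = [i / 20 for i in range(-60, 61)]  # contains q = 0 and q = 1

# The maximum of f sits at alpha_0 = tau'(0); there f equals -tau(0).
alpha_0 = -(math.log2(P0) + math.log2(1 - P0)) / 2
f_max = legendre_f(tau_n, alpha_0, q_grid)
print("tau(0) =", tau_n(0.0), " tau(1) =", tau_n(1.0), " f(alpha_0) =", f_max)
```

Normalization of the measure forces $\tau(1) = 0$, and $-\tau(0)$ counts the cells; the same two properties hold for the perceptron mass exponent.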

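To perform the analysis for the perceptron we start with the definition of the cell size

$$P(\boldsymbol{\sigma}) = \int d\mu(\mathbf{J}) \prod_{\mu=1}^{p} \theta\Big(\frac{1}{\sqrt{N}}\, \sigma^\mu\, \mathbf{J}\boldsymbol{\xi}^\mu\Big) \tag{8}$$

using the Heaviside step function $\theta(x)$. The integral measure $d\mu(\mathbf{J})$ ensures that the total volume is normalized to 1.

In the thermodynamic limit we expect both $\tau$ and $f$ to become self-averaging, and we can therefore calculate the mass exponent (5) by using the replica trick, introducing n identical replicas numbered $a = 1, \ldots, n$ to perform the average over the quenched patterns. Moreover, we introduce a second replica index $\alpha = 1, \ldots, q$ in order to represent the q-th power of P in (5), assuming as usual that the result can be meaningfully continued to real values of q [13]. Introducing integral representations for the Heaviside function we arrive at a replicated partition function given by

$$\langle\langle Z^n \rangle\rangle = \Big\langle\!\Big\langle \sum_{\{\sigma^a_\mu\}} \int \prod_{a,\alpha} d\mu(\mathbf{J}^{a\alpha}) \int_0^\infty \prod_{\mu,a,\alpha} d\lambda^{a\alpha}_\mu \int \prod_{\mu,a,\alpha} \frac{dx^{a\alpha}_\mu}{2\pi}\, \exp\Big\{ i \sum_{\mu,a,\alpha} x^{a\alpha}_\mu \Big( \lambda^{a\alpha}_\mu \sigma^a_\mu - \frac{1}{\sqrt{N}}\, \mathbf{J}^{a\alpha} \boldsymbol{\xi}^\mu \Big) \Big\} \Big\rangle\!\Big\rangle. \tag{9}$$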



The average over the quenched patterns $\boldsymbol{\xi}^\mu$ can now be easily done. To disentangle the remaining integrals we introduce the order parameters

$$Q^{\alpha\beta}_{ab} = \frac{1}{N} \sum_i J^{a\alpha}_i J^{b\beta}_i \tag{10}$$

as the overlap of two coupling vectors, and their conjugates $\hat{Q}^{\alpha\beta}_{ab}$. The spherical as well as the Ising constraints restrict the self-overlap $Q^{\alpha\alpha}_{aa}$ to 1. The other values of the order parameter matrices are obviously invariant under simultaneous permutations $a \leftrightarrow b$ and $\alpha \leftrightarrow \beta$. We then find



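$$\langle\langle Z^n \rangle\rangle = \int \prod_{(a,\alpha)<(b,\beta)} \frac{dQ^{\alpha\beta}_{ab}\, d\hat{Q}^{\alpha\beta}_{ab}}{2\pi/N}\, \exp\Big\{ N \Big[ \sum_{(a,\alpha)<(b,\beta)} Q^{\alpha\beta}_{ab} \hat{Q}^{\alpha\beta}_{ab} + \gamma \log G_0(Q^{\alpha\beta}_{ab}) + \log G_1(\hat{Q}^{\alpha\beta}_{ab}) \Big] \Big\} \tag{11}$$

with

$$G_0(Q^{\alpha\beta}_{ab}) = \sum_{\{\sigma^a\}} \int_0^\infty \prod_{a,\alpha} d\lambda^{a\alpha} \int \prod_{a,\alpha} \frac{dx^{a\alpha}}{2\pi}\, \exp\Big\{ i \sum_{a,\alpha} x^{a\alpha} \lambda^{a\alpha} \sigma^a - \frac{1}{2} \sum_{a,b,\alpha,\beta} x^{a\alpha} x^{b\beta} Q^{\alpha\beta}_{ab} \Big\},$$

$$G_1(\hat{Q}^{\alpha\beta}_{ab}) = \Big[ \int \prod_{a,\alpha} d\mu(\mathbf{J}^{a\alpha})\, \exp\Big\{ -\sum_{(a,\alpha)<(b,\beta)} \hat{Q}^{\alpha\beta}_{ab}\, \mathbf{J}^{a\alpha} \mathbf{J}^{b\beta} \Big\} \Big]^{1/N}. \tag{12}$$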



Here $(a,\alpha) < (b,\beta)$ denotes either $a < b$, or $a = b$ and $\alpha < \beta$, and counts the elements above the main diagonal in the order parameter matrices. The integrals over the order parameters in (11) can be done using the saddle point method.

Finding the correct saddle point is in general a very difficult task. A simple ansatz is the replica symmetric one. For the present situation it is important to note that the output sequences $\{\sigma^a_\mu\}$ carry only one replica index. The typical overlap of two coupling vectors within one cell (same output sequence) will hence in general be different from the typical overlap between two coupling vectors belonging to different cells (different output sequences). Therefore we have to introduce already within the replica symmetric (RS) approximation two different overlap values in order to determine the saddle point of (11) (see [7]):



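$$Q^{\alpha\beta}_{ab} = \begin{cases} 1 & \text{if } (a,\alpha) = (b,\beta) \\ Q_1 & \text{if } a = b,\ \alpha \neq \beta \\ Q_0 & \text{if } a \neq b \end{cases} \tag{13}$$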
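The block structure of the ansatz (13) can be checked with a few lines of linear algebra. The sketch below builds the matrix for integer `n` and `q` (an assumption made purely for illustration, since the replica calculation continues both to real values) and verifies three eigenvectors by direct multiplication. The eigenvalue formulas in the comments are standard results for this block structure and are not quoted from the text.

```python
def rs_overlap_matrix(n, q, q1, q0):
    """Overlap matrix of ansatz (13): 1 on the diagonal, q1 inside a cell block, q0 between blocks."""
    size = n * q
    return [[1.0 if i == j else (q1 if i // q == j // q else q0)
             for j in range(size)] for i in range(size)]

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def is_eigenpair(matrix, vector, value, tol=1e-12):
    """True if matrix @ vector equals value * vector component by component."""
    return all(abs(y - value * v) < tol
               for y, v in zip(matvec(matrix, vector), vector))

n, q, Q1, Q0 = 3, 4, 0.6, 0.2   # illustrative values only
Q = rs_overlap_matrix(n, q, Q1, Q0)
size = n * q

within = [1.0, -1.0] + [0.0] * (size - 2)                # difference inside one cell
between = [1.0] * q + [-1.0] * q + [0.0] * (size - 2 * q)  # difference between two cells
uniform = [1.0] * size

print(is_eigenpair(Q, within, 1 - Q1))                         # eigenvalue 1 - Q1
print(is_eigenpair(Q, between, 1 + (q - 1) * Q1 - q * Q0))      # eigenvalue 1 + (q-1)Q1 - q Q0
print(is_eigenpair(Q, uniform, 1 + (q - 1) * Q1 + (n - 1) * q * Q0))
```

For $Q_0 = 0$ the second and third eigenvalues coincide, which is one way to see that the blocks decouple in that case.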



In accordance with the above discussion $Q_1$ then denotes the typical overlap within one cell, whereas $Q_0$ denotes the overlap between different cells. The structure of the conjugated order parameter is analogous, having non-unit diagonal elements $\hat{Q}_2$.

Plugging this RS ansatz into (11) one realizes that $Q_0 = \hat{Q}_0 = 0$ always solves the saddle point equations for $Q_0$ and $\hat{Q}_0$. This has an obvious physical interpretation: due to the symmetry of (1) and therefore of the crowding index $\alpha(\boldsymbol{\sigma})$ under the transformation $(\mathbf{J}, \boldsymbol{\sigma}) \leftrightarrow (-\mathbf{J}, -\boldsymbol{\sigma})$ every cell has a "mirror cell" of the same size and shape on the "opposite side" of the coupling space (see fig. 1). $Q_0 = 0$ simply reflects this symmetry. It can be explicitly broken by introducing a threshold in (1). Note that $Q_0 = \hat{Q}_0 = 0$ means formally that the quenched average over the input patterns can be performed as an annealed average.



III. THE SPHERICAL PERCEPTRON 



A. Replica symmetry 



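In the case of the spherical perceptron the coupling space is restricted to the N-dimensional hypersphere defined by the global spherical constraint $\mathbf{J}^2 = N$. In the large-N limit this gives rise to the integral measure

$$d\mu(\mathbf{J}) = \prod_i \frac{dJ_i}{\sqrt{2\pi e}}\; \delta(N - \mathbf{J}^2). \tag{14}$$

Using the replica symmetric ansatz (13) with $Q_0 = \hat{Q}_0 = 0$ leads to the mass exponent

$$\tau(q) = -\frac{1}{\log 2}\, \mathop{\mathrm{extr}}_{Q_1, \hat{Q}_{1,2}} \Big[ \frac{q}{2}(\hat{Q}_2 - 1) + \frac{q(q-1)}{2} Q_1 \hat{Q}_1 - \frac{1}{2} \log\big(\hat{Q}_2 + (q-1)\hat{Q}_1\big) - \frac{q-1}{2} \log\big(\hat{Q}_2 - \hat{Q}_1\big) + \gamma \log 2 \int Dt\; H^q\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big) \Big] \tag{15}$$

where we introduced the abbreviations $Dt = dt\, \exp(-t^2/2)/\sqrt{2\pi}$ for the Gaussian measure and $H(x) = \int_x^\infty Dt$. As is well known for spherical models, the saddle point equations for the conjugated order parameters $\hat{Q}^{\alpha\beta}_{ab}$ can be solved explicitly, which in the present case yields

$$\tau(q) = -\frac{1}{\log 2}\, \mathop{\mathrm{extr}}_{Q_1} \Big[ \frac{1}{2} \log\big(1 + (q-1)Q_1\big) + \frac{q-1}{2} \log(1 - Q_1) + \gamma \log 2 \int Dt\; H^q\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big) \Big]. \tag{16}$$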
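The order parameter $Q_1$ is self-consistently determined by the saddle point equation

$$\frac{Q_1}{1 + (q-1)Q_1} = \frac{\gamma}{2\pi}\; \frac{\int Dt\; H^{q-2}\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big) \exp\Big( -\frac{Q_1}{1-Q_1}\, t^2 \Big)}{\int Dt\; H^{q}\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big)}. \tag{17}$$

The multifractal spectrum $f(\alpha)$ resulting from a numerical solution of these equations is shown in fig. 2 for various loadings $\gamma$.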
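A numerical solution of the kind mentioned above can be sketched with the standard library alone. The code assumes that Eq. (16) reads $\tau(q) = -(1/\log 2)\,\mathrm{extr}_{Q_1}[\tfrac12\log(1+(q-1)Q_1) + \tfrac{q-1}{2}\log(1-Q_1) + \gamma\log 2\int Dt\,H^q(\sqrt{Q_1/(1-Q_1)}\,t)]$ and that Eq. (17) is its stationarity condition. The quadrature parameters, the bisection bracket, the tail cut-off at `x > 20` and the restriction to roughly $0.2 \le q \le 1$ are choices made for this illustration, and none of the function names come from the paper.

```python
import math

def H(x):
    """Gaussian tail H(x) = int_x^inf Dt."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gauss_avg(func, t_max=8.0, steps=2000):
    """Trapezoidal estimate of int Dt func(t) on [-t_max, t_max]."""
    h = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * t * t) * func(t)
    return total * h / math.sqrt(2.0 * math.pi)

def tail_factor(x, q):
    """H(x)**(q-2) * exp(-x*x); the far tail, negligible for q >= 0.2, is cut to avoid 0/0."""
    return 0.0 if x > 20.0 else H(x) ** (q - 2.0) * math.exp(-x * x)

def bracket_16(q1, q, gamma):
    """Expression inside the extremum of the assumed form of Eq. (16)."""
    kappa = math.sqrt(q1 / (1.0 - q1))
    energetic = math.log(2.0 * gauss_avg(lambda t: H(kappa * t) ** q))
    return (0.5 * math.log(1.0 + (q - 1.0) * q1)
            + 0.5 * (q - 1.0) * math.log(1.0 - q1) + gamma * energetic)

def residual_17(q1, q, gamma):
    """Left-hand side minus right-hand side of the assumed form of Eq. (17)."""
    kappa = math.sqrt(q1 / (1.0 - q1))
    num = gauss_avg(lambda t: tail_factor(kappa * t, q))
    den = gauss_avg(lambda t: H(kappa * t) ** q)
    return q1 / (1.0 + (q - 1.0) * q1) - gamma / (2.0 * math.pi) * num / den

def solve_q1(q, gamma, lo=1e-6, hi=0.999, iterations=60):
    """Bisection on residual_17, which is negative near Q1 = 0 and positive near Q1 = 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual_17(mid, q, gamma) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau_rs(q, gamma):
    return -bracket_16(solve_q1(q, gamma), q, gamma) / math.log(2.0)

gamma = 0.5
q1_half = solve_q1(0.5, gamma)
print("Q1(q=0.5) =", q1_half, " tau(0.5) =", tau_rs(0.5, gamma))
```

Two exact properties serve as sanity checks independent of $Q_1$: the bracket vanishes at $q = 1$, so $\tau(1) = 0$, and at $q = 0$ it equals $\gamma \log 2$, so $\tau(0) = -\gamma$.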




FIG. 2. Multifractal spectrum $f(\alpha)$ characterizing the cell structure of the coupling space of the spherical perceptron for various values of the loading parameter $\gamma = 0.2, 0.35, 0.5, 1.0, 2.0$ (from left to right). The curves end at their maxima because of the divergence of the mass exponent $\tau(q)$ for negative q (corresponding to the dotted parts). Replica symmetry holds between the diamonds and the maxima.

For small values of $\gamma$ we find the typical bell-shaped form of $f(\alpha)$. The zeros $\alpha_{min}(\gamma)$ specify the RS estimate of the largest cell occurring with non-zero probability [14].

The most frequent cell size corresponds to the maximum of $f(\alpha)$ and is therefore given by $\alpha_0(\gamma) = \mathrm{argmax}(f(\alpha))$. For large N cells of this size dominate the total number of cells exponentially, i.e. a randomly chosen output sequence $\boldsymbol{\sigma}$ will be found with probability 1 in a cell of size $\alpha_0$. Hence $\epsilon^{\alpha_0(\gamma)}$ (cf. (3)) is the typical volume of couplings realizing $p = \gamma N$ random input-output mappings as determined within a standard Gardner calculation [1]. This volume becomes zero, i.e. $\alpha_0(\gamma) \to \infty$, for $\gamma \to 2$ (cf. fig. 2) in accordance with the Gardner result [?].

In addition we can infer from $f(\alpha_0(\gamma))$ the typical number of cells as first calculated with the help of geometrical methods by Cover [?]. For small loading ratios $\gamma$ we find $f(\alpha_0) = \gamma$, i.e. all $2^{\gamma N}$ possible cells (or almost all of them) do indeed occur. The storage problem for these $\gamma$ values is then solvable with probability one. For $\gamma > 2$ we have $f(\alpha_0) = f(\infty) < \gamma$, implying that only an exponentially small fraction of all possible cells can be realized. It is then typically impossible to find couplings realizing a randomly generated set of input-output mappings. The multifractal analysis of the coupling space hence nicely reconciles the previously complementary approaches to the storage problem of the perceptron by Cover and Gardner respectively. From both the analysis of $\alpha_0(\gamma)$ and of $f(\alpha_0(\gamma))$ one finds the well-known result $\gamma_c = 2$.
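The geometrical count referred to above can be illustrated with Cover's classical counting function for p points in general position and a perceptron without threshold in N dimensions, $C(p, N) = 2\sum_{k=0}^{N-1}\binom{p-1}{k}$. This formula is the standard textbook result and is supplied here as background; it is not written out in the text, and the function names are illustrative.

```python
import math

def cover_count(p, n):
    """Number of non-empty cells (realizable output sequences) for p points in general position."""
    return 2 * sum(math.comb(p - 1, k) for k in range(n))

def realized_fraction(p, n):
    """Fraction of the 2**p conceivable output sequences that correspond to non-empty cells."""
    return cover_count(p, n) / 2 ** p

N = 10
for gamma in (0.5, 1.0, 2.0, 4.0):
    p = int(gamma * N)
    print(f"gamma = {gamma}: fraction of realized cells = {realized_fraction(p, N):.4g}")
```

For $p \le N$ all $2^p$ cells occur, at $p = 2N$ exactly half of them do, and beyond that the realized fraction drops rapidly, in line with the role of $\gamma = 2$ in the discussion above.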

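Although the cells of size $\alpha_0$ are the most frequent ones, their joint contribution to the total volume of the sphere is negligible. Since

$$1 = \sum_{\boldsymbol{\sigma}} P(\boldsymbol{\sigma}) = \int_0^\infty d\alpha\; \exp\big( N \log 2\, [f(\alpha) - \alpha] \big) \tag{18}$$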
a saddle point argument reveals that the cells with size $\alpha_1(\gamma)$ defined by $f'(\alpha_1) = 1$ dominate the volume. Cells of larger size are too rare, those more frequent are too small to compete. Consequently a randomly chosen coupling vector $\mathbf{J}$ will belong with probability 1 to a cell of size $\alpha_1$. By the definition (2) of the cells all other couplings of this cell will give the same output for all patterns $\boldsymbol{\xi}^\mu$. Therefore $\epsilon^{\alpha_1}$ is nothing but the volume of the version space of a teacher perceptron chosen at random from a uniform probability distribution on the sphere of possible perceptrons. From it (or equivalently from $Q_1(q = 1, \gamma)$) one can determine the generalization error as a function of the training set size $\gamma$, reproducing the results of [15].
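The route from $Q_1(q = 1, \gamma)$ to a learning curve can be sketched as follows. Two assumptions go beyond the text: Eq. (17) is taken at $q = 1$ in the form $Q_1 = (\gamma/\pi)\int Dt\, \exp(-\kappa^2 t^2)/H(\kappa t)$ with $\kappa^2 = Q_1/(1-Q_1)$, and the generalization error is obtained from the standard geometric relation $\varepsilon_g = \arccos(Q_1)/\pi$ for two perceptrons with overlap $Q_1$. Function names and numerical parameters are illustrative.

```python
import math

def H(x):
    """Gaussian tail H(x) = int_x^inf Dt."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gauss_avg(func, t_max=8.0, steps=2000):
    """Trapezoidal estimate of int Dt func(t) on [-t_max, t_max]."""
    h = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * t * t) * func(t)
    return total * h / math.sqrt(2.0 * math.pi)

def ratio(x):
    """exp(-x*x) / H(x); the negligible far tail is set to zero to avoid 0/0."""
    return 0.0 if x > 20.0 else math.exp(-x * x) / H(x)

def residual(q1, gamma):
    """Assumed q = 1 form of Eq. (17), written as left-hand side minus right-hand side."""
    kappa = math.sqrt(q1 / (1.0 - q1))
    return q1 - gamma / math.pi * gauss_avg(lambda t: ratio(kappa * t))

def overlap_q1(gamma, lo=1e-9, hi=0.9999, iterations=60):
    """Bisection: the residual is negative near Q1 = 0 and positive near Q1 = 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid, gamma) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generalization_error(gamma):
    """Probability that two perceptrons with overlap Q1 disagree on a random input."""
    return math.acos(overlap_q1(gamma)) / math.pi

for g in (0.5, 2.0, 5.0):
    print(f"gamma = {g}: Q1 = {overlap_q1(g):.4f}, eps_g = {generalization_error(g):.4f}")
```

The error decreases monotonically with the training set size, starting from the random-guessing value 1/2 at $\gamma = 0$.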

The main properties of the perceptron can hence be derived from the multifractal spectrum $f(\alpha)$ of the cell size distribution in the coupling space. Below we show that the RS ansatz together with the assumption $Q_0 = 0$ gives valid results for $0 < q < 1$.

There is also a close formal analogy between the calculation of $f(\alpha)$ and the standard Gardner approach, with q playing the role of the replica number in the Gardner calculation. Since from (6) we have $q = df/d\alpha$, the calculation of $\alpha_0$ is related to $q \to 0$ whereas the generalization problem concentrating on $\alpha_1$ corresponds to $q \to 1$. These limits of the replica number in Gardner calculations are well known to correspond to the storage and generalization problems respectively [16].



B. Longitudinal instability of replica symmetry 



The results of the previous paragraph were obtained within the RS approximation and using $Q_0 = 0$. The discussion of their validity requires a careful determination of the stability of these ansätze for the different values of q. We first discuss the stability with respect to longitudinal fluctuations in $Q_0$, i.e. we search for an RS saddle point solution where the symmetry giving rise to $Q_0 = 0$ is spontaneously broken. The full replica symmetric saddle point equations are of the form

$$\frac{Q_0}{\big(1 + (q-1)Q_1 - qQ_0\big)^2} = \frac{\gamma}{2\pi(1-Q_1)} \int Dy \left[ \frac{\int Dt\, \big(H_+^{q-1} - H_-^{q-1}\big) \exp\Big\{ -\frac{(\sqrt{Q_0}\,y + \sqrt{Q_1 - Q_0}\,t)^2}{2(1-Q_1)} \Big\}}{\int Dt\, \big(H_+^q + H_-^q\big)} \right]^2 \tag{19}$$

$$\frac{Q_1 - Q_0}{1 + (q-1)Q_1 - qQ_0} + \frac{Q_0 (1 - Q_1)}{\big(1 + (q-1)Q_1 - qQ_0\big)^2} = \frac{\gamma}{2\pi} \int Dy\; \frac{\int Dt\, \big(H_+^{q-2} + H_-^{q-2}\big) \exp\Big\{ -\frac{(\sqrt{Q_0}\,y + \sqrt{Q_1 - Q_0}\,t)^2}{1-Q_1} \Big\}}{\int Dt\, \big(H_+^q + H_-^q\big)} \tag{20}$$

where

$$H_\pm = H\Big( \pm \frac{\sqrt{Q_0}\,y + \sqrt{Q_1 - Q_0}\,t}{\sqrt{1 - Q_1}} \Big). \tag{21}$$

Linearizing these equations in $Q_0$ we find that new solutions $Q_0 > 0$ bifurcate continuously from $Q_0 = 0$ at

$$q_\pm = 1 \pm \frac{\sqrt{\gamma}}{Q_1(q_\pm, \gamma)}. \tag{22}$$

From (22) and (17) the two transition points $q_\pm(\gamma)$ for positive/negative inverse temperature q can be determined explicitly. In the range $q_- < q < q_+$ there exists only the solution with $Q_0 = 0$, which becomes unstable at the transition points.



C. Divergence of negative moments 

The expression (15) for the replica symmetric mass exponent has been calculated for positive integer q. The continuation to negative values of q gives rise to divergences at

$$\delta = \frac{1 + (q-1)Q_1}{1 - Q_1} = 0 \tag{23}$$

as can be realized from an asymptotic analysis of the integrand in the last term of (15). Because of $H(t) \propto \exp(-t^2/2)/(\sqrt{2\pi}\, t)$ for large t we get an exponential part of this term proportional to $\exp(-\delta t^2/2)$, and the whole integral converges only if $\delta > 0$. For $\delta \to 0$ we therefore find that $\tau$ tends to $-\infty$. The global minimum of (15) with respect to $Q_1$ is hence no longer given by the saddle point described by (16), (17), which realizes only a local minimum with respect to $Q_1$. There is hence a discontinuous longitudinal transition at $q = 0$ with $Q_1$ jumping from the solution of (17) to $1/(1-q)$. As a consequence $f(\alpha)$ is not defined for $f'(\alpha) < 0$, and the curves for $f(\alpha)$ as obtained above by using the saddle-point equations are reliable only for $q > 0$, i.e. for positive slope. The parts corresponding to $q < 0$ are dotted in fig. 2.

It is tempting to speculate that the observed divergence for negative q is due to the existence of empty cells $V(\boldsymbol{\sigma}) = 0$ in (5). In the theory of neural networks (and more generally of classifier systems) the possibility of output sequences impossible to implement by the system is related to the Vapnik-Chervonenkis (VC) dimension $d_{VC}$ of the class of networks under consideration [?]. It has been notoriously difficult to determine the VC dimension of a neural network from statistical mechanics calculations since the definition of the VC dimension involves a supremum over all possible pattern sets rather than the average featuring in (5). The above analysis of the instability with respect to $Q_1, \hat{Q}_1$ for $q < 0$ reveals that the multifractal formalism in the present form is unfortunately also unable to determine $d_{VC}$ since the divergence of negative moments of $V(\boldsymbol{\sigma})$ occurs for all values of $\gamma$. The pattern average performed in (5) does not allow one to decide whether the observed divergence of $\tau$ is due to the fact that $\gamma > d_{VC}/N$ or is due to exceptional pattern realizations that give rise to empty cells also if $\gamma < d_{VC}/N$ [?]. Note in this connection also that similar divergences in the theory of multifractals [21] are related to cells with a volume which is non-zero but decreases for $N \to \infty$ quicker than exponentially. For the perceptron on the other hand it is known that empty cells exist for all values of N [?].



D. Transversal instability of replica symmetry and percolation 

In addition to the longitudinal instabilities discussed in the previous paragraph there is the possibility of a transversal instability invalidating the RS ansatz [?], which we study now. The replica symmetric ansatz (13) is formally similar to a one-step replica symmetry broken (1RSB) solution of a standard replica calculation [17]. It is advantageous to use the results and the notation given in [?]. After some lengthy calculations we arrive at 4 different eigenvalues corresponding to the replicon modes denoted by (0,1,1), (1,2,2), (0,2,2) and (0,2,1) in [?]. We give here only the result for the (0,1,1)-mode, which is found to be the first to become unstable:

$$\Lambda(0,1,1) = 1 - \frac{1}{\gamma} (q-1)^2 Q_1^2. \tag{24}$$

It vanishes exactly at the two points calculated in (22) describing the instability with respect to longitudinal $Q_0$-fluctuations. Similar to the SK-model [?] in zero field the longitudinal and transversal instabilities of the RS solution hence occur at the same temperature $1/q$ [?]. Due to the divergences for $q < 0$ only $q_+$ is of further relevance. The AT-points are therefore determined by $q_+(\gamma)$; they are marked in fig. 2 by the diamonds.

The eigenvalue $\Lambda(0,1,1)$ describes fluctuations in that part of the overlap matrix having only $Q_0$-entries, i.e. corresponding to the overlaps between different cells. This is reasonable since for the spherical perceptron the cells themselves are known to be convex. No RSB is hence expected to be necessary to describe the structure of a single cell [1]. The instability to RSB found above concerns the distribution of overlaps between different cells which must now be characterized by two parameters. The smaller one remains equal to zero, still reflecting the symmetry of (1). The other one is larger than zero and describes the formation of clusters made of cells of identical size.

In order to interpret the RSB transition in physical terms we appeal again to the analogy with the SK-model. There the analogous instability corresponds to broken ergodicity, i.e. to the fact that not all parts of the phase space can be reached from a given initial condition. In the perceptron problem single spin flips in the output sequence $\boldsymbol{\sigma}$ are equivalent to hops between neighboring cells in the coupling space. The breakdown of $Q_0 = 0$ at $q_+$ hence signals that starting in a cell of a size corresponding to $q_+$, i.e. starting with a spin configuration $\boldsymbol{\sigma}$ with energy $\alpha(\boldsymbol{\sigma})$ corresponding to $q_+$, it becomes impossible to reach the "mirror cell" by hops using only cells of the same or larger sizes, i.e. via spin configurations $\boldsymbol{\sigma}$ with the same or smaller energy. Since the relative number of larger cells is exponentially small we can interpret the observed breaking of RS as a percolation transition in the infinite dimensional space of couplings. For $0 < q < q_+$ the cells of size $\epsilon^{\alpha(q)}$ percolate in coupling space in the sense that they can all be reached from each other by entering only cells of the same size. For $q > q_+$ this is no longer true and the cells form clusters isolated from each other.
IV. THE ISING PERCEPTRON



A. Replica symmetry 



In the Ising perceptron the entries of the coupling vector are restricted to $J_i = \pm 1$, $i = 1, \ldots, N$; the full coupling space is hence given by the $2^N$ corners of an N-dimensional hypercube. The cells are therefore represented by discrete sets, and their probability measure is given by the number of elements multiplied by $2^{-N}$. Thus, the coupling space measure is to be modified according to

$$\int d\mu(\mathbf{J}) \;\to\; 2^{-N} \sum_{\{J_i = \pm 1\}}. \tag{25}$$

Following the general procedure described in section II we arrive at

$$\tau(q) = q + \lim_{n \to 0} \frac{1}{n \log 2}\, \mathop{\mathrm{extr}}_{Q^{\alpha\beta}_{ab},\, \hat{Q}^{\alpha\beta}_{ab}} \Big\{ \frac{1}{2} \sum_{(a,\alpha) \neq (b,\beta)} Q^{\alpha\beta}_{ab} \hat{Q}^{\alpha\beta}_{ab} - \log G_1(\hat{Q}^{\alpha\beta}_{ab}) - \gamma \log G_0(Q^{\alpha\beta}_{ab}) \Big\} \tag{26}$$



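with

$$G_0(Q^{\alpha\beta}_{ab}) = \sum_{\{\sigma^a\}} \int_0^\infty \prod_{a,\alpha} d\lambda^{a\alpha} \int \prod_{a,\alpha} \frac{dx^{a\alpha}}{2\pi}\, \exp\Big\{ i \sum_{a,\alpha} x^{a\alpha} \lambda^{a\alpha} \sigma^a - \frac{1}{2} \sum_{a,b,\alpha,\beta} x^{a\alpha} x^{b\beta} Q^{\alpha\beta}_{ab} \Big\} \tag{27}$$

and

$$G_1(\hat{Q}^{\alpha\beta}_{ab}) = \sum_{\{J^{a\alpha} = \pm 1\}} \exp\Big\{ \frac{1}{2} \sum_{(a,\alpha) \neq (b,\beta)} \hat{Q}^{\alpha\beta}_{ab}\, J^{a\alpha} J^{b\beta} \Big\}. \tag{28}$$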



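Contrary to the spherical case the conjugated parameters $\hat{Q}^{\alpha\beta}_{ab}$ cannot be eliminated analytically. The replica symmetric expressions are obtained by introducing the saddle point structure (13) for both the overlaps and their conjugates. Starting again with the solution $Q_0 = \hat{Q}_0 = 0$ we get for the mass exponent

$$\tau(q) = \frac{1}{\log 2}\, \mathop{\mathrm{extr}}_{Q_1, \hat{Q}_1} \Big[ \frac{q \hat{Q}_1}{2} \big(1 + (q-1)Q_1\big) - \log \int Dt\; \cosh^q\big(\sqrt{\hat{Q}_1}\, t\big) - \gamma \log 2 \int Dt\; H^q\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big) \Big]. \tag{29}$$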



The extremization in this equation is again somewhat subtle [?]. There are two qualitatively different possibilities:

(i) The first one is given by $Q_1 = 1$ and $\hat{Q}_1 = \infty$ and hence lies at the boundary of allowed values of the saddle-point parameters. It can be studied analytically and leads to $\tau(q) = q - 1$. The corresponding multifractal spectrum is given by a single point $f = \alpha = 1$. No dependence on $\gamma$ remains. The value $\alpha = 1$ describes cells containing just a single coupling vector in accordance with $Q_1 = 1$. Their total number is of order $\epsilon^{-f} = 2^N$, and hence they form a macroscopic part of the cell number as well as of the total coupling space volume. Since the total number of cells is $2^{\gamma N}$ this solution can exist for $\gamma > 1$ only.

(ii) The second solution solves the saddle-point equations

$$Q_1 = \frac{\int Dt\; \cosh^{q-2}\big(\sqrt{\hat{Q}_1}\, t\big) \sinh^2\big(\sqrt{\hat{Q}_1}\, t\big)}{\int Dt\; \cosh^{q}\big(\sqrt{\hat{Q}_1}\, t\big)}, \qquad \hat{Q}_1 = \frac{\gamma}{2\pi (1 - Q_1)}\; \frac{\int Dt\; H^{q-2}\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big) \exp\Big( -\frac{Q_1}{1-Q_1}\, t^2 \Big)}{\int Dt\; H^{q}\Big( \sqrt{\frac{Q_1}{1-Q_1}}\; t \Big)}. \tag{30}$$

It lies inside the allowed intervals for the parameters and has to be determined numerically. It exists for all q if $\gamma < 1$ and disappears for fixed $\gamma > 1$ at a sufficiently negative q.
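The value $\tau(q) = q - 1$ quoted for the boundary solution (i) can be checked in a few lines. The sketch assumes that the bracket in Eq. (29) reads $\tfrac{q\hat Q_1}{2}(1+(q-1)Q_1) - \log\int Dt\,\cosh^q(\sqrt{\hat Q_1}\,t) - \gamma\log 2\int Dt\,H^q(\sqrt{Q_1/(1-Q_1)}\,t)$ and restricts itself to $q > 0$, where the large-argument form of the hyperbolic cosine is unproblematic.

```latex
\begin{align*}
\int Dt\,\cosh^q\!\big(\sqrt{\hat Q_1}\,t\big)
  &\simeq 2^{-q}\int Dt\; e^{\,q\sqrt{\hat Q_1}\,|t|}
   \simeq 2^{\,1-q}\, e^{\,q^2\hat Q_1/2}
  && (\hat Q_1\to\infty,\ q>0),\\[2pt]
2\int Dt\,H^q\!\Big(\sqrt{\tfrac{Q_1}{1-Q_1}}\;t\Big)
  &\longrightarrow 2\int_{-\infty}^{0} Dt = 1
  && (Q_1\to 1),\\[2pt]
\tau(q)\,\log 2
  &\longrightarrow \frac{q^2\hat Q_1}{2}
     -\Big[(1-q)\log 2+\frac{q^2\hat Q_1}{2}\Big]-\gamma\log 1
   = (q-1)\log 2 .
\end{align*}
```

The divergent terms in $\hat Q_1$ cancel, leaving $\tau(q) = q - 1$ independent of $\gamma$. Inserting this into (6), $\alpha q - \tau(q) = (\alpha - 1) q + 1$ has a finite minimum over q only for $\alpha = 1$, where $f = 1$, which is the isolated point described in (i).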
A numerical comparison of the corresponding local maxima/minima of $\tau$ leads to the following scenario: for $\gamma < 1$ solution (ii) always gives the global extremum. For $\gamma > 1$ this is only the case for $q > q_{disc}(\gamma)$. At this threshold the extremum (i) becomes the global one by a discontinuous transition. The occurrence of such transitions is the trademark of neural networks with Ising couplings [?]. The smooth curves of $f(\alpha)$ then terminate and are completed by a single point at (1,1) (see fig. 3). Hence for $\gamma > 1$ the spectrum is nonzero only in a certain region $\alpha_{min} < \alpha < \alpha_{max} < 1$ and at the isolated point at $\alpha = 1$.




FIG. 3. Multifractal spectrum $f(\alpha)$ describing the cell structure of the coupling space of the Ising perceptron with loading ratios $\gamma = 0.2, 0.4, 0.833, 1.245, 1.4$ (from left to right). The curves end at their maxima due to the divergence of the mass exponent $\tau(q)$ for negative q. Between the diamonds and the maxima RS holds. The triangle denotes the discontinuous transition to $Q_1 = 1$. The isolated point (1,1) is marked by the square.

The general form of the multifractal spectrum resembles that of the spherical perceptron. In fact, for $\gamma < 0.25$ the curves almost coincide. This was to be expected because the cells are still relatively large and do not "feel" the discreteness of the Ising couplings. For larger $\gamma$, however, it becomes decisive that in the Ising perceptron the cell sizes are restricted to $\alpha \le 1$. All values $\alpha > 1$ correspond to empty cells.

The storage capacity is therefore not given by $\alpha_0 \to \infty$ as in the case of the spherical perceptron but is determined by $\alpha_0(\gamma) = \mathrm{argmax}(f(\alpha, \gamma)) = 1$. As can be seen from fig. 3 this holds for $\gamma_c = 0.833$, the well-known result obtained in [?]. In fact $\alpha_0(\gamma) = 1$ is equivalent to the zero-entropy condition frequently used for neural networks with discrete couplings. Let us define the entropy

$$s = \lim_{N \to \infty} \frac{1}{N} \langle\langle \log \mathcal{N} \rangle\rangle \tag{31}$$

where $\mathcal{N}$ denotes the number of couplings that can implement a mapping between $\gamma N$ random inputs and outputs. Therefore $\mathcal{N}$ is related to the size of the typical cell, and from eqs. (3), (25) one finds $s = (1 - \alpha_0) \log 2$. Hence $s = 0$ is equivalent to $\alpha_0 = 1$.

Similarly the generalization behaviour of the Ising perceptron differs from that of the spherical one. There is a well-known discontinuous transition to perfect generalization at $\gamma_g = 1.245$ [27]. It shows up in the multifractal spectrum $f(\alpha, \gamma)$ as the point where $\alpha_1(\gamma_g) = 1$, with $\alpha_1$ again defined by $f'(\alpha_1) = 1$. At this value of the loading parameter the discontinuous transition to $Q_1 = 1$ occurs and the coupling space becomes dominated by cells with exactly one element. This is the intuitive reason for the transition to perfect generalization: there is only one coupling vector left that performs perfectly on the training set, namely the teacher herself.







B. Continuous replica symmetry breaking and percolation 



Similar to section III we have analyzed the longitudinal and transversal stability of the RS solution with $Q_0 = \hat{Q}_0 = 0$ by calculating the relevant eigenvalues of the fluctuation matrix. The results are qualitatively the same. Using the linearization of the complete replica symmetric saddle point equations at $Q_0 = \hat{Q}_0 = 0$ we find again a new solution with $Q_0, \hat{Q}_0 > 0$ emerging continuously at the two values $q_\pm$ satisfying

$$\sqrt{\gamma} = \pm \hat{Q}_1 (q_\pm - 1)(1 - Q_1)\big(1 + (q_\pm - 1) Q_1\big). \tag{32}$$

The analysis of the transversal fluctuations reveals that the first mode to become unstable is again the replicon mode (0,1,1). Due to the existence of two order parameter matrices the analysis is now more involved. Following the argumentation in [?] we first determine the eigenvalues in the two building blocks of the fluctuation matrix and find for the (0,1,1)-eigenvalue of the overlap fluctuations

$$\Lambda^{(Q)}(0,1,1) = -\frac{1}{\gamma} (q-1)^2 (1 - Q_1)^2 \hat{Q}_1^2 \tag{33}$$

and for the conjugated matrix

$$\Lambda^{(\hat{Q})}(0,1,1) = -\big(1 + (q-1) Q_1\big)^2. \tag{34}$$

The total fluctuation matrix for this replicon mode is then given by

$$\begin{pmatrix} \Lambda^{(Q)}(0,1,1) & 1 \\ 1 & \Lambda^{(\hat{Q})}(0,1,1) \end{pmatrix} \tag{35}$$

and the breakdown of RS is signaled by a change of the sign of its determinant. From (33), (34), and (35) we find that the local transversal instability occurs again at the same values $q_\pm(\gamma)$ given by (32) for which the RS solution with $Q_0 = \hat{Q}_0 = 0$ becomes longitudinally unstable.
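The coincidence of the two instabilities can be made explicit in one step. The sketch assumes that the eigenvalues (33) and (34) read $\Lambda^{(Q)} = -\tfrac{1}{\gamma}(q-1)^2(1-Q_1)^2\hat Q_1^2$ and $\Lambda^{(\hat Q)} = -(1+(q-1)Q_1)^2$, and that the off-diagonal entries of the matrix (35) are equal to 1.

```latex
\begin{align*}
0 &= \det\begin{pmatrix}\Lambda^{(Q)} & 1\\ 1 & \Lambda^{(\hat Q)}\end{pmatrix}
   = \Lambda^{(Q)}\Lambda^{(\hat Q)} - 1
   = \frac{1}{\gamma}\,(q-1)^2 (1-Q_1)^2\,\hat Q_1^{\,2}\,\big(1+(q-1)Q_1\big)^2 - 1 \\[4pt]
\Longrightarrow\quad
\sqrt{\gamma} &= \pm\,\hat Q_1\,(q_\pm-1)\,(1-Q_1)\,\big(1+(q_\pm-1)Q_1\big).
\end{align*}
```

This is the longitudinal condition (32). For comparison, identifying $\hat Q_1(1-Q_1)(1+(q-1)Q_1)$ with $Q_1$, as suggested by comparing the right-hand sides of (30) and (17), turns the same condition into the spherical result $q_\pm = 1 \pm \sqrt{\gamma}/Q_1$.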

In fig. 3 the breakdown of the local stability of RS is again marked by the diamonds. Inside the interval $(\alpha(q_+), \alpha_0)$ RS is locally stable. As discussed already for the spherical case, in this region the solutions are symmetric under the reflection symmetry of the original system (1) and the cells of every fixed crowding index $\alpha \in (\alpha(q_+), \alpha_0)$ percolate in coupling space. Outside this interval the overlaps between different cells must again be described by two (or more) order parameters. The smaller one vanishes, and still reflects the symmetry of (1). The other one takes a value in $(0, Q_1)$, with $Q_1$ being the overlap within one single cell, and describes the size of the connected clusters remaining below the percolation threshold.



C. Divergence of negative moments 

For $q < 0$ there occurs an analogous divergence of $\tau$ for small $\delta$ defined by (23) as in the spherical case, which gives rise to a similar discontinuous transition with respect to $Q_1$. As a result the $f(\alpha)$-curves do not continue into regions of negative slope for any $\gamma$. It is hence again impossible to infer the (still unknown [?]) VC dimension of the Ising perceptron from the multifractal analysis. Note that the only value that could in principle be obtained from a statistical mechanics analysis is what is called the typical VC dimension, which gives the maximal pattern set size for which typically no empty cells occur [?]. Numerical investigations suggest that this value is equal to N/2 [?].



D. Discontinuous replica symmetry breaking 

Contrary to the case of the spherical perceptron there is an inconsistency even within the region of local stability of the RS ansatz. For $0.833 < \gamma < 1.245$ the multifractal spectrum $f(\alpha)$ continues to values $\alpha > 1$ corresponding to the unphysical region of cells having fewer than one but more than zero elements.

We therefore expect a discontinuous transition to RSB already in the region of local stability of RS. Due to the fact that a single cell is not necessarily connected for the Ising perceptron, this transition is likely to take place inside the blocks describing the overlaps within a cell. The global reflection symmetry remains unbroken and therefore the typical overlaps between two cells stay zero. We can hence calculate the annealed average $\langle\langle Z \rangle\rangle$ of the partition function. After a standard calculation we find the following one-step RSB (1RSB) result for the mass exponent
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ARSB 



(?) 



l g2 extr ™.Qi.Q2,Qi,Q2 



^q(q - ro)QiQi + |Q 2 (1 + (m - 1)Q 2 ) 



log / Dz! 



Dz 2 cosh m (\/Q 1 z 1 + v/Qi -Q222) 



(36) 



-7 log 2 / £>ti 



where $Q_2$ is the order parameter inside the $m \times m$ diagonal blocks of the overlap matrix and $Q_1$ is the entry outside 
these blocks. $\hat{Q}_{1,2}$ are the corresponding conjugate quantities. A similar expression was obtained in 0. 
Guided by previous experience [26] we look for an extremum with $Q_2 = 1$ and $\hat{Q}_2 \to \infty$. One then finds 



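$$
\tau^{\mathrm{1RSB}}(q) = \operatorname*{extr}_{m} \left[ q \left( 1 - \frac{1}{m} \right) + \tau^{\mathrm{RS}}\!\left( \frac{q}{m} \right) \right] \tag{37}
$$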



where we have used the RS mass exponent $\tau^{\mathrm{RS}}$ from (29). The saddle point equation with respect to $m$ is then given 
by 

$$
1 = \frac{d\tau^{\mathrm{RS}}}{dq}\!\left( \frac{q}{m} \right) \tag{38}
$$
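
The step from (37) to (38) can be made explicit. The following is only a sketch: it assumes that (37) reads $\tau^{\mathrm{1RSB}}(q) = \operatorname{extr}_m [\, q(1 - 1/m) + \tau^{\mathrm{RS}}(q/m) \,]$ and that $\alpha = d\tau/dq$ as in the standard multifractal formalism. The symbol $q^{*}$ is introduced here purely for illustration and does not appear in the original text.

```latex
\begin{align*}
0 &= \frac{\partial}{\partial m}
     \left[ q\left(1 - \frac{1}{m}\right) + \tau^{\mathrm{RS}}\!\left(\frac{q}{m}\right) \right]
   = \frac{q}{m^{2}} \left[ 1 - \frac{d\tau^{\mathrm{RS}}}{dq}\!\left(\frac{q}{m}\right) \right]
   \;\Longrightarrow\;
   \frac{d\tau^{\mathrm{RS}}}{dq}(q^{*}) = 1, \qquad q^{*} = \frac{q}{m}, \\
\tau^{\mathrm{1RSB}}(q) &= q - q^{*} + \tau^{\mathrm{RS}}(q^{*}) .
\end{align*}
```

Under these assumptions $q^{*}$ is fixed by $\gamma$ alone, so $\tau^{\mathrm{1RSB}}$ is linear in $q$ with unit slope. Every $q$ in the 1RSB regime then maps to $\alpha = 1$, so the spectrum cannot extend beyond this point.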



Comparing the values of $\tau^{\mathrm{RS}}$ and $\tau^{\mathrm{1RSB}}$ one finds that for $\gamma > 0.833$ there is indeed a discontinuous transition to this 
1RSB solution when $\alpha$ gets larger than 1. Crossing this point $\tau(q)$ becomes proportional to $q$, implying that $f(\alpha)$ 
stops at $\alpha = 1$. This removes the inconsistency noted above. For $0.833 < \gamma < 1.245$ the cells contributing most to the 
total volume (the $\alpha_1$-cells) contain exponentially many couplings. The majority of cells, however, comprise only 
sub-exponentially many couplings. This situation is correctly described by the 1RSB solution with $Q_1 < 1$ characterizing 
the $\alpha_1$-cells and $Q_2 = 1$ characterizing the typical ones. With increasing $\gamma$ the $\alpha_1$-cells shrink and the typical cells 
disappear. At $\gamma = 1.245$ we have $\alpha_1 = 1$ and correspondingly $Q_1 = 1$. At this point $\tau$ is again given by the minimum 
in ( |2q ) since $q > 1$. Therefore the 1RSB solution has to be rejected and the discontinuous transition disappears. The 
1RSB solution also becomes intrinsically inconsistent since there is "no room left" for a $Q_2$ with $Q_1 < Q_2 < 1$. For 
$\gamma > 1.245$ the 1RSB solution finds its natural continuation in the RS solution with $Q_1 = 1$ that gives rise to the gap 
in the $f(\alpha)$ spectrum as discussed in the first paragraph of this section. 



E. Numerical results 



For the Ising perceptron the phase space is discrete and the analytical results discussed above can be checked by 
numerical enumeration of all $2^N$ possible coupling vectors with $J_i = \pm 1$. Although these techniques are naturally 
confined to rather low values of $N$, it is interesting to see whether the asymptotic behaviour already shows up in small 
samples. We have performed enumerations for values of $N$ between 10 and 30 according to the following prescription. 
We first generate $p$ patterns at random from a Gaussian distribution with zero mean and unit variance. As in related 
studies [31] we choose Gaussian patterns because they show less pronounced finite size fluctuations. Next we use 
the Gray code [32] to run through all coupling vectors $\mathbf{J}$ and determine the corresponding output strings. Finally we 
determine the size of the cells by counting the multiplicity of the occurring outputs and compile a histogram of cell 
sizes on a double logarithmic scale. The results are averaged over $10^4$ (for $N = 10$) to 10 (for $N = 30$) realizations 
of the random patterns. Although the main reason for the smaller number of realizations used for large $N$ was limited 
computer time, it became quite clear in the simulations that, owing to self-averaging, the sample-to-sample fluctuations 
for large $N$ are substantially smaller than for small $N$. The results of the enumerations together with the corresponding 
analytical results already displayed in fig. 3 are shown in fig. 4. It is at first surprising that the histograms always lie 
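above the analytical curves. However, for small $\gamma$ we have 

$$
\begin{aligned}
2^{\gamma N} &= \int d\alpha \, 2^{N f(\alpha)} \\
&\simeq 2^{N f(\alpha_0)} \int d\alpha \, \exp\!\left( \frac{N}{2} \log 2 \; f''(\alpha_0) (\alpha - \alpha_0)^2 \right) \\
&\simeq 2^{N f(\alpha_0)} \sqrt{\frac{2\pi}{N \log 2 \, |f''(\alpha_0)|}}
\end{aligned}
\tag{39}
$$

giving rise to 

$$
f(\alpha_0) \simeq \gamma + \frac{1}{2N} \log_2 \frac{N \log 2 \, |f''(\alpha_0)|}{2\pi} \tag{40}
$$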



Hence the maximum of the histograms for finite $N$ converges to the asymptotic value $f(\alpha_0) = \gamma$ for $N \to \infty$ from 
above. In fact, in the inset of fig. 4 we have compared the finite size scaling predicted by (40) with enumeration results 
for $f(\alpha_0)$ at $\gamma = 0.5$. The agreement is very good. 
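
The enumeration prescription described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce fig. 4. The function names `cell_sizes` and `spectrum`, the use of NumPy, the number of bins, and the convention that a cell containing $k$ couplings has normalized size $k / 2^N = 2^{-N\alpha}$ are all choices made here for illustration.

```python
from collections import Counter

import numpy as np


def cell_sizes(patterns):
    """Count how many couplings J in {-1,+1}^N produce each output string.

    patterns: array of shape (p, N). Returns a Counter mapping each
    occurring output string (as bytes) to the size of its cell.
    """
    p, n = patterns.shape
    J = -np.ones(n)                      # start from J = (-1, ..., -1)
    fields = patterns @ J                # local fields xi^mu . J, shape (p,)
    counts = Counter()
    for step in range(1 << n):
        if step > 0:
            # Successive Gray codes differ in exactly one bit, whose index
            # equals the number of trailing zeros of `step`.
            k = (step & -step).bit_length() - 1
            J[k] = -J[k]
            fields += 2.0 * J[k] * patterns[:, k]   # O(p) update per coupling
        counts[(fields > 0).tobytes()] += 1         # output string as key
    return counts


def spectrum(counts, n, bins=20):
    """Crude estimate of f(alpha) from the cell sizes (assumed convention).

    alpha = -log2(size / 2^N) / N and f = log2(number of cells in bin) / N.
    The bin width is ignored, which shifts f by a term of order 1/N.
    """
    sizes = np.array(list(counts.values()), dtype=float)
    alpha = 1.0 - np.log2(sizes) / n
    hist, edges = np.histogram(alpha, bins=bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = hist > 0                      # skip empty bins before taking logs
    return centers[mask], np.log2(hist[mask]) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 12, 5                         # small illustrative sizes only
    counts = cell_sizes(rng.standard_normal((p, n)))
    for a, f in zip(*spectrum(counts, n)):
        print(f"alpha = {a:.3f}   f = {f:.3f}")
```

The Gray code ordering means that only one coupling changes between consecutive steps, so the local fields can be updated in $O(p)$ operations instead of being recomputed from scratch. In pure Python this sketch is practical only for $N$ up to roughly 20; reaching $N = 30$ would need a compiled implementation, and a single sample as shown here is not averaged over pattern realizations.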



FIG. 4. Exact enumeration results for the multifractal spectrum $f(\alpha)$ of the Ising perceptron for $p = 6, N = 30$; 
$p = 12, N = 30$ and $p = 20, N = 24$ (from left to right). The dotted lines are the analytical results of fig. 3 for 
$\gamma = 0.2$, 0.4 and 0.833, respectively. The inset shows a finite size scaling for $f(\alpha_0)$ at $\gamma = 0.5$. 
The diamonds are enumeration results whereas the line is given by eq. (40) with $|f''(\alpha_0)|$ estimated from the numerical data. 
The rms fluctuations of the enumeration results are smaller than the symbol size. The asymptotic value for $f(\alpha_0)$ is 0.5. 



V. SUMMARY 



In this paper, we have presented a multifractal analysis of the coupling space of the single-layer perceptron with 
continuous and Ising couplings. This has been done by characterizing the random partition of the coupling space into 
different cells corresponding to different output sequences on the same fixed set of random input vectors. This picture 
allowed us to refine the standard Gardner analysis and, moreover, to unify the different approaches to the storage 
problem of Gardner [1] and Cover and the generalization problem within one consistent picture. The different 
questions are related to different fractal subsets of the coupling space: The cells with the most frequent size describe 
the storage problem, those dominating the total volume are related to the generalization ability for a randomly drawn 
teacher. 

We have shown that the storage and the generalization problem can always be analyzed within the region where 
replica symmetry is locally stable. Moreover we have demonstrated that the most important part of the multifractal 
spectrum can be determined within an annealed calculation with respect to the input pattern distribution. This is 
not only an important technical advantage but may also smooth the way for mathematically rigorous investigations 
of these problems. 

Replica symmetry must be broken if one aims at describing comparatively large and therefore rare cells. Despite 
the symmetry of the coupling space under point reflection at the origin these cells no longer percolate in the sense 
that it is impossible to reach the corresponding "mirror" cell without entering cells of smaller size. This clustering of 
large cells is described by the higher order parameters of a solution with broken replica symmetry. 

Finally we note that the central procedure in our calculations is the determination of positive integer moments $V^q$ 
of the distribution of phase space volumes and the continuation of the result to real $q$ [ fl3[ . In doing so we encountered 
divergences for all $q < 0$. These are probably due to the existence of empty cells $V = 0$. We therefore hope that an 
appropriately modified formalism describing the metastable state that occurs for $q < 0$ might also be able to yield 
results for the VC-dimension of neural networks. 